In Lectures 7 and 8, we introduced time series and ARIMA models. In this demo we work through some examples, relying in particular on the R package astsa.
install.packages('astsa')
library(astsa)
library(xts)
set.seed(2020)
First we plot some time series to illustrate common challenges: non-stationarity, cyclic patterns, and multivariate structure.
layout(matrix(c(1,1,2,3,4,5), nrow=2, ncol=3))
# non-stationary time series
tsplot(globtemp, type="o", ylab="Global Temperature Deviations")
# cyclic pattern
tsplot(soi, ylab="", main="Southern Oscillation Index (SOI)")
tsplot(rec, ylab="", main="Recruitment")
# multivariate time series
ts.plot(fmri1[,2:5], col=1:4, ylab="BOLD", xlab="", main="Cortex")
ts.plot(fmri1[,6:9], col=1:4, ylab="BOLD", xlab="", main="Thalamus & Cerebellum")
mtext("Time (1 pt = 2 sec)", side=1, line=2)
We also investigated the moving average model \(v_t = \frac13 (w_{t-1}+w_t + w_{t+1})\). In R, it can be computed and plotted using the filter function.
w = rnorm(500,0,1) # 500 N(0,1) variates
v = filter(w, sides=2, rep(1/3,3)) # moving average
par(mfrow=c(2,1))
tsplot(w, main="white noise")
tsplot(v, ylim=c(-3,3), main="moving average")
And the autoregressive model \(x_t = x_{t-1} -0.9 x_{t-2} + w_t\):
w = rnorm(550,0,1) # 50 extra to avoid startup problems
x = filter(w, filter=c(1,-.9), method="recursive")[-(1:50)]
tsplot(x, main="autoregression")
And the random walk with drift model \(x_t = \delta +x_{t-1} +w_t\):
w = rnorm(200); x = cumsum(w) # two commands in one line
wd = w +.2; xd = cumsum(wd)
tsplot(xd, ylim=c(-5,55), main="random walk", ylab='')
lines(x, col=4)
abline(h=0, col=4, lty=2)
abline(a=0, b=.2, lty=2)
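Since \(\nabla x_t = \delta + w_t\), the sample mean of the first differences recovers the drift. A minimal base-R check (the seed and sample size here are arbitrary):

```r
set.seed(1)
w  <- rnorm(200)             # noise
xd <- cumsum(w + .2)         # random walk with drift delta = .2
delta.hat <- mean(diff(xd))  # diff() undoes cumsum(): delta plus average noise
delta.hat
```

The estimate should be close to the true drift 0.2, with error of order \(1/\sqrt{n}\).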
The cross-correlation function (CCF) \(\rho_{xy}(h) = \frac{\gamma_{xy}(h)}{\sqrt{\gamma_x(0)\gamma_y(0)}}\) can be used to measure the dependence between two time series. For example, \(y_t = A x_{t-\ell} + w_t\) is a lagged model in which \(y_t\) lags \(x_t\) by \(\ell\) time units (\(x\) leads \(y\)). In R, we can use the ccf function to plot the CCF.
x = rnorm(100)
y = lag(x, -5) + rnorm(100)
ccf(y, x, ylab='CCovF', type='covariance')
abline(v=0, lty=2)
text(11, .9, 'x leads')
text(-9, .9, 'y leads')
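As a sanity check on the lag convention, the sample CCF of this pair should peak at lag \(+5\) on the ccf(y, x) scale, since \(y_t = x_{t-5} + w_t\) means x leads y by 5. A self-contained sketch (ts.intersect aligns the shifted series first; the seed is arbitrary):

```r
set.seed(2020)
x  <- ts(rnorm(100))
y  <- stats::lag(x, -5) + rnorm(100)  # y lags x by 5 time units
xy <- ts.intersect(x = x, y = y)      # keep only the overlapping times
cc <- ccf(xy[, "y"], xy[, "x"], plot = FALSE)
cc$lag[which.max(cc$acf)]             # lag of the peak correlation
```

Recall that ccf(a, b) at lag k estimates the correlation of a at time t+k with b at time t, so the peak lands at k = 5 here.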
To measure the correlation of a time series with itself, we consider the autocorrelation function (ACF) \(\rho(h) = \frac{\gamma(h)}{\gamma(0)}\). In R, the acf function computes its sample version (acf1 from astsa also prints the values). Let's use the speech data as an example.
par(mfrow=c(2,1))
# Speech recording of the syllable aaa ... hhh
tsplot(speech)
# ACF of the speech series (up to lag 250)
acf1(speech, 250)
## [1] 0.92 0.71 0.42 0.12 -0.14 -0.34 -0.47 -0.54 -0.53 -0.46 -0.31 -0.12 0.08 0.24 0.33 0.34 0.30 0.22 0.14 0.05 -0.02 -0.10 -0.16 -0.20 -0.21 -0.19 -0.13 -0.07 -0.01 0.04 0.07 0.09 0.09 0.09 0.06 0.03 -0.02 -0.07 -0.10 -0.12 -0.12 -0.11 -0.09 -0.07 -0.05 -0.03 -0.01 0.00 0.01 0.01 0.00 0.00 0.00 0.00 0.00 0.01 0.01 0.00 -0.01 -0.03 -0.05 -0.07 -0.09 -0.10 -0.11 -0.10 -0.07 -0.04 -0.01 0.02 0.05 0.06 0.07 0.07 0.06 0.03 -0.01 -0.06 -0.11 -0.15 -0.17 -0.16 -0.13 -0.07 -0.01 0.06 0.13 0.19 0.23 0.24 0.22 0.14 0.03 -0.11 -0.24 -0.34 -0.39 -0.38 -0.32 -0.20 -0.04 0.14 0.33 0.50 0.62 0.67 0.62 0.48 0.30 0.10 -0.08 -0.22 -0.31 -0.36 -0.35 -0.30 -0.22 -0.10 0.02 0.12 0.19 0.22 0.20 0.16 0.11 0.06 0.01 -0.03 -0.08 -0.11 -0.13 -0.13 -0.10 -0.07 -0.04 -0.01 0.02 0.03 0.05 0.05 0.04 0.02 -0.01 -0.04 -0.06 -0.08 -0.08 -0.08 -0.07 -0.06 -0.05 -0.04 -0.02 -0.01 0.00 0.01 0.00 0.00 -0.01 -0.01 0.00 0.00 0.01 0.01 0.00
## [166] -0.01 -0.03 -0.04 -0.05 -0.06 -0.06 -0.07 -0.06 -0.05 -0.03 -0.01 0.02 0.03 0.04 0.04 0.04 0.03 0.01 -0.01 -0.04 -0.07 -0.09 -0.09 -0.08 -0.05 -0.01 0.03 0.06 0.08 0.09 0.10 0.10 0.08 0.03 -0.03 -0.09 -0.15 -0.18 -0.18 -0.15 -0.10 -0.03 0.04 0.13 0.21 0.30 0.35 0.37 0.35 0.28 0.19 0.08 -0.01 -0.10 -0.16 -0.19 -0.20 -0.18 -0.15 -0.09 -0.02 0.04 0.08 0.11 0.12 0.11 0.09 0.07 0.04 0.01 -0.02 -0.05 -0.07 -0.08 -0.08 -0.07 -0.06 -0.04 -0.03 -0.01 0.00 0.01 0.01 0.00 -0.01
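To connect the definition with its sample version, we can compute \(\hat\rho(1)\) by hand and compare with acf(). A base-R sketch on simulated noise (any series would do):

```r
set.seed(1)
x <- rnorm(200); n <- length(x); xbar <- mean(x)
# sample autocovariance at lag h, with divisor n (the convention acf() uses)
gamma.hat <- function(h) sum((x[1:(n-h)] - xbar) * (x[(1+h):n] - xbar)) / n
rho1 <- gamma.hat(1) / gamma.hat(0)
all.equal(rho1, acf(x, plot = FALSE)$acf[2])  # TRUE; acf[1] is lag 0
```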
We can also use the ccf2 function from astsa to estimate the CCF of two time series, e.g. SOI vs. Recruitment.
par(mfrow=c(3,1))
acf(soi, 48, main="Southern Oscillation Index")
acf(rec, 48, main="Recruitment")
ccf2(soi, rec, 48, main="SOI vs Recruitment")
To review classical regression and basic exploratory data analysis for time series, we consider the monthly price per pound of chicken in the US from mid-2001 to mid-2016 (180 months). There is an obvious upward trend in the series, and we might use simple linear regression to estimate that trend by fitting the model \[x_t = \beta_0 + \beta_1 z_t +w_t, \quad z_t=2001\tfrac{7}{12}, 2001\tfrac{8}{12}, \ldots, 2016\tfrac{6}{12}.\]
summary(fit <- lm(chicken~time(chicken))) # regress price on time
##
## Call:
## lm(formula = chicken ~ time(chicken))
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.7411 -3.4730 0.8251 2.7738 11.5804
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -7.131e+03 1.624e+02 -43.91 <2e-16 ***
## time(chicken) 3.592e+00 8.084e-02 44.43 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.696 on 178 degrees of freedom
## Multiple R-squared: 0.9173, Adjusted R-squared: 0.9168
## F-statistic: 1974 on 1 and 178 DF, p-value: < 2.2e-16
tsplot(chicken, ylab="cents per pound", col=4, lwd=2)
abline(fit) # add the fitted regression line to the plot
After removing the fitted linear trend \(\hat\mu_t = -7131 + 3.59\,t\), we obtain the residuals \(\hat y_t = x_t- \hat\mu_t\). Alternatively, we can use differencing to remove the trend and plot \(\nabla x_t\).
par(mfrow=c(2,1))
tsplot(resid(fit), main="detrended")
tsplot(diff(chicken), main="first difference")
Furthermore, we can investigate the autocorrelation of the resulting series.
par(mfrow=c(3,1)) # plot ACFs
acf(chicken, 48, main="chicken")
acf(resid(fit), 48, main="detrended")
acf(diff(chicken), 48, main="first difference")
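Why does first differencing remove the trend? If \(x_t = \beta_0 + \beta_1 t + w_t\), then \(\nabla x_t = \beta_1 + w_t - w_{t-1}\), which no longer depends on \(t\). A quick check with made-up coefficients:

```r
set.seed(42)
n <- 180
w <- rnorm(n)
x <- 10 + 0.3 * (1:n) + w          # linear trend plus noise (illustrative values)
all.equal(diff(x), 0.3 + diff(w))  # TRUE: differencing leaves beta1 plus noise
```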
When there are multiple explanatory time series, we can fit a classical regression model for the dependent series and perform model selection. Consider a study of the possible effects of temperature and pollution on weekly mortality in Los Angeles County. Note the strong seasonal components in all of the series, corresponding to winter-summer variations, and the downward trend in cardiovascular mortality over the 10-year period.
par(mfrow=c(3,1))
tsplot(cmort, main="Cardiovascular Mortality", ylab="")
tsplot(tempr, main="Temperature", ylab="")
tsplot(part, main="Particulates", ylab="")
# Regression
temp = tempr-mean(tempr) # center temperature
temp2 = temp^2 # square it
trend = time(cmort) # time
fit = lm(cmort~ trend + temp + temp2 + part, na.action=NULL)
summary(fit) # regression results
##
## Call:
## lm(formula = cmort ~ trend + temp + temp2 + part, na.action = NULL)
##
## Residuals:
## Min 1Q Median 3Q Max
## -19.0760 -4.2153 -0.4878 3.7435 29.2448
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.831e+03 1.996e+02 14.19 < 2e-16 ***
## trend -1.396e+00 1.010e-01 -13.82 < 2e-16 ***
## temp -4.725e-01 3.162e-02 -14.94 < 2e-16 ***
## temp2 2.259e-02 2.827e-03 7.99 9.26e-15 ***
## part 2.554e-01 1.886e-02 13.54 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.385 on 503 degrees of freedom
## Multiple R-squared: 0.5954, Adjusted R-squared: 0.5922
## F-statistic: 185 on 4 and 503 DF, p-value: < 2.2e-16
summary(aov(fit)) # ANOVA table (compare to next line)
## Df Sum Sq Mean Sq F value Pr(>F)
## trend 1 10667 10667 261.62 <2e-16 ***
## temp 1 8607 8607 211.09 <2e-16 ***
## temp2 1 3429 3429 84.09 <2e-16 ***
## part 1 7476 7476 183.36 <2e-16 ***
## Residuals 503 20508 41
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(aov(lm(cmort~cbind(trend, temp, temp2, part)))) # Table 2.1
## Df Sum Sq Mean Sq F value Pr(>F)
## cbind(trend, temp, temp2, part) 4 30178 7545 185 <2e-16 ***
## Residuals 503 20508 41
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
num = length(cmort) # sample size
AIC(fit)/num - log(2*pi) # AIC
## [1] 4.721732
BIC(fit)/num - log(2*pi) # BIC
## [1] 4.771699
# AIC(fit, k=log(num))/num - log(2*pi) # BIC (alt method)
(AICc = log(sum(resid(fit)^2)/num) + (num+5)/(num-5-2)) # AICc
## [1] 4.722062
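The commented "alt method" works because BIC is AIC with penalty \(k = \log n\). This identity can be checked on any fitted lm; here on synthetic data:

```r
set.seed(3)
d <- data.frame(t = 1:100, y = 0.5 * (1:100) + rnorm(100))
fit0 <- lm(y ~ t, data = d)
all.equal(BIC(fit0), AIC(fit0, k = log(100)))  # TRUE
```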
Recall that the Southern Oscillation Index (SOI) measured at time \(t − 6\) months is associated with the Recruitment series at time \(t\), indicating that the SOI leads the Recruitment series by six months. We consider the following lagged regression \[R_t = \beta_0 + \beta_1 S_{t-6} + w_t\]
Performing lagged regression in R is a little difficult because the series must be aligned prior to running the regression. The easiest way to do this is to create a data frame (that we call fish) using ts.intersect, which aligns the lagged series.
fish = ts.intersect(rec, soiL6=lag(soi,-6), dframe=TRUE)
summary(fit <- lm(rec~soiL6, data=fish, na.action=NULL))
##
## Call:
## lm(formula = rec ~ soiL6, data = fish, na.action = NULL)
##
## Residuals:
## Min 1Q Median 3Q Max
## -65.187 -18.234 0.354 16.580 55.790
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 65.790 1.088 60.47 <2e-16 ***
## soiL6 -44.283 2.781 -15.92 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 22.5 on 445 degrees of freedom
## Multiple R-squared: 0.3629, Adjusted R-squared: 0.3615
## F-statistic: 253.5 on 1 and 445 DF, p-value: < 2.2e-16
tsplot(fish$rec, ylim=c(0,111)) # plot the data and the fitted values (not shown in text)
lines(fitted(fit), col=2)
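What ts.intersect does can be seen on a toy pair of series (hypothetical values):

```r
a  <- ts(1:10)           # observed at times 1..10
b  <- stats::lag(a, -2)  # same values, shifted to times 3..12
ab <- ts.intersect(a, aL2 = b, dframe = TRUE)
nrow(ab)   # 8: only times 3..10 are common to both series
ab$aL2[1]  # 1: at time 3, the lagged column holds a's value from time 1
```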
In addition to transformations, we consider another preliminary data-processing technique used to visualize the relations between series at different lags: scatterplot matrices.
Note that the ACF captures the linear relationship between \(x_t\) and \(x_{t-h}\): it profiles the linear correlation at all possible lags and shows which values of \(h\) give the best predictability. However, a nonlinear relationship between \(x_t\) and \(x_{t-h}\) may be masked. This leads to the idea of examining scatterplots, e.g. of \(x_t\) versus \(x_{t-h}\), or \(y_t\) versus \(x_{t-h}\) for two series.
Let’s again consider the scatterplots of Southern Oscillation Index (SOI) and Recruitment series.
lag1.plot(soi, 12)
lag2.plot(soi, rec, 8)
We notice that the relationship is nonlinear and differs according to whether SOI is positive or negative. To account for this, we fit the model \[R_t = \beta_0 +\beta_1 S_{t-6} + \beta_2 D_{t-6} + \beta_3 D_{t-6} S_{t-6} +w_t\] where \(D_t\) is a dummy variable that is \(0\) if \(S_t<0\) and \(1\) otherwise, i.e. \(D_t = I(S_t\geq 0)\).
dummy = ifelse(soi<0, 0, 1)
fish = ts.intersect(rec, soiL6=lag(soi,-6), dL6=lag(dummy,-6), dframe=TRUE)
summary(fit <- lm(rec~ soiL6*dL6, data=fish, na.action=NULL))
##
## Call:
## lm(formula = rec ~ soiL6 * dL6, data = fish, na.action = NULL)
##
## Residuals:
## Min 1Q Median 3Q Max
## -63.291 -15.821 2.224 15.791 61.788
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 74.479 2.865 25.998 < 2e-16 ***
## soiL6 -15.358 7.401 -2.075 0.0386 *
## dL6 -1.139 3.711 -0.307 0.7590
## soiL6:dL6 -51.244 9.523 -5.381 1.2e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 21.84 on 443 degrees of freedom
## Multiple R-squared: 0.4024, Adjusted R-squared: 0.3984
## F-statistic: 99.43 on 3 and 443 DF, p-value: < 2.2e-16
layout(matrix(c(1,1,2,3),nrow=2,ncol=2,byrow=T))
attach(fish)
plot(soiL6, rec)
lines(lowess(soiL6, rec), col=4, lwd=2)
points(soiL6, fitted(fit), pch='+', col=2)
tsplot(resid(fit)) # not shown ...
acf(resid(fit)) # ... but obviously not noise
Smoothing is useful for discovering certain traits in a time series, such as long-term trend and seasonal components. In particular, we can use a moving average filter to smooth the monthly SOI series, \[m_t = \sum_{j=-k}^k a_j x_{t-j}\] with \(a_0 = a_{\pm 1}=\cdots= a_{\pm 5}=1/12\), \(a_{\pm 6} = 1/24\), and \(k=6\).
wgts = c(.5, rep(1,11), .5)/12
soif = filter(soi, sides=2, filter=wgts)
tsplot(soi)
lines(soif, lwd=2, col=4)
par(fig = c(.75, 1, .75, 1), new = TRUE) # the insert
nwgts = c(rep(0,20), wgts, rep(0,20))
plot(nwgts, type="l", ylim = c(-.02,.1), xaxt='n', yaxt='n', ann=FALSE)
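For the filter to act as an average (leaving a constant series unchanged), the weights must sum to one. A one-line check of the weights above:

```r
wgts <- c(.5, rep(1, 11), .5) / 12
sum(wgts)     # 1 (up to floating point): 2*(1/24) + 11*(1/12) = 1
length(wgts)  # 13 weights: k = 6 on each side of the center
```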
Alternatively, one can consider kernel smoothing via the ksmooth function, or locally weighted scatterplot smoothing (lowess) via the lowess function.
# Gaussian kernel smoother
tsplot(soi)
lines(ksmooth(time(soi), soi, "normal", bandwidth=1), lwd=2, col=4)
par(fig = c(.75, 1, .75, 1), new = TRUE) # the insert
gauss = function(x) { 1/sqrt(2*pi) * exp(-(x^2)/2) }
x = seq(from = -3, to = 3, by = 0.001)
plot(x, gauss(x), type ="l", ylim=c(-.02,.45), xaxt='n', yaxt='n', ann=FALSE)
# lowess
tsplot(soi)
lines(lowess(soi, f=.05), lwd=2, col=4) # El Nino cycle
lines(lowess(soi), lty=2, lwd=2, col=2) # trend (with default span)
In this section, we review AR(p), MA(q), ARMA(p, q), ARIMA(p, d, q) models.
First, we look at two simulated AR(1) models with \(\phi=0.9\) and \(\phi=-0.9\) respectively.
par(mfrow=c(2,1))
# in the expressions below, ~ is a space and == is equal
tsplot(arima.sim(list(order=c(1,0,0), ar=.9), n=100), ylab="x", main=(expression(AR(1)~~~phi==+.9)))
tsplot(arima.sim(list(order=c(1,0,0), ar=-.9), n=100), ylab="x", main=(expression(AR(1)~~~phi==-.9)))
Next, we look at two simulated MA(1) models with \(\theta=0.9\) and \(\theta=-0.9\) respectively.
par(mfrow=c(2,1))
tsplot(arima.sim(list(order=c(0,0,1), ma=.9), n=100), ylab="x", main=(expression(MA(1)~~~theta==+.9)))
tsplot(arima.sim(list(order=c(0,0,1), ma=-.9), n=100), ylab="x", main=(expression(MA(1)~~~theta==-.9)))
We can fit a time series with specified orders \(p\) and \(q\) using the arima function. In the example below the data are white noise, yet the fitted ARMA(1,1) model \[(1+ .96B) x_t = (1+ .95B) w_t \] has redundant parameters: the AR and MA polynomials nearly cancel.
set.seed(8675309) # Jenny, I got your number
x = rnorm(150, mean=5) # generate iid N(5,1)s
arima(x, order=c(1,0,1)) # estimation
##
## Call:
## arima(x = x, order = c(1, 0, 1))
##
## Coefficients:
## ar1 ma1 intercept
## -0.9595 0.9527 5.0462
## s.e. 0.1688 0.1750 0.0727
##
## sigma^2 estimated as 0.7986: log likelihood = -195.98, aic = 399.96
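One way to see the redundancy: because the fitted AR and MA polynomials nearly cancel, the implied \(\psi\)-weights are essentially zero and the model collapses to white noise. Using the estimates printed above:

```r
# psi-weights implied by the (nearly redundant) fitted ARMA(1,1)
psi <- ARMAtoMA(ar = -0.9595, ma = 0.9527, 10)
max(abs(psi))  # all weights are tiny: indistinguishable from white noise
```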
We presented several methods (back substitution, Taylor expansion, matching coefficients) to obtain the MA and AR representations of ARMA models (the \(\psi\) weights in \(x_t = \psi(B) w_t\) and the \(\pi\) weights in \(\pi(B) x_t = w_t\), respectively). In R, we can simply use ARMAtoMA to obtain them.
Consider the ARMA(1,1) model \((1-.9B) x_t = (1+.5B)w_t\), which is both causal and invertible. Matching coefficients and solving the resulting difference equations gives \[\psi_j = 1.4\,(0.9)^{j-1}, \quad \pi_j = (-1)^j\, 1.4\,(0.5)^{j-1}, \quad j\geq 1\]
ARMAtoMA(ar = .9, ma = .5, 10) # first 10 psi-weights
## [1] 1.4000000 1.2600000 1.1340000 1.0206000 0.9185400 0.8266860 0.7440174 0.6696157 0.6026541 0.5423887
ARMAtoMA(ar = -.5, ma = -.9, 10) # first 10 pi-weights
## [1] -1.400000000 0.700000000 -0.350000000 0.175000000 -0.087500000 0.043750000 -0.021875000 0.010937500 -0.005468750 0.002734375
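The closed forms can be verified against ARMAtoMA directly:

```r
j <- 1:10
all.equal(1.4 * 0.9^(j - 1),          ARMAtoMA(ar =  .9, ma =  .5, 10))  # TRUE
all.equal((-1)^j * 1.4 * 0.5^(j - 1), ARMAtoMA(ar = -.5, ma = -.9, 10))  # TRUE
```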
In Lecture 8, we mentioned that the ACF alone tells us little about the orders of dependence for autoregressive-type models; thus the partial autocorrelation function (PACF) is introduced.
Consider the AR(2) model \(x_t = 1.5 x_{t-1} - 0.75 x_{t-2} +w_t\). We can use the ARMAacf function to compute the ACF and PACF of ARMA series.
z = c(1,-1.5,.75) # coefficients of the polynomial
(a = polyroot(z)[1]) # print one root, 1 + i/sqrt(3) = 1+0.57735i
## [1] 1+0.57735i
arg = Arg(a)/(2*pi) # arg in cycles/pt
1/arg # = 12, the period
## [1] 12
layout(matrix(c(1,1,2,3),nrow=2,ncol=2,byrow=T))
set.seed(8675309) # Jenny, it's me again
ar2 = arima.sim(list(order=c(2,0,0), ar=c(1.5,-.75)), n = 144)
plot(ar2, axes=FALSE, xlab="Time")
axis(2); axis(1, at=seq(0,144,by=12)); box() # work the plot machine
abline(v=seq(0,144,by=12), lty=2)
# ACF
ACF = ARMAacf(ar=c(1.5,-.75), ma=0, 24)
plot(ACF, type="h", xlab="lag")
abline(h=0)
# PACF
PACF = ARMAacf(ar=c(1.5,-.75), ma=0, 24, pacf=TRUE)
plot(PACF, type="h", xlab="lag")
abline(h=0)
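A defining property of AR(p) is that the theoretical PACF cuts off after lag \(p\); for this AR(2) we can confirm it numerically:

```r
PACF <- ARMAacf(ar = c(1.5, -.75), ma = 0, 24, pacf = TRUE)
PACF[2]               # -0.75: phi_22 equals phi_2 for an AR(2)
max(abs(PACF[3:24]))  # essentially 0 beyond lag p = 2
```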
Given a time series, e.g. the Recruitment series, we can use acf2 from astsa to print and plot empirical estimates of the ACF and PACF simultaneously (or use the built-in functions acf and pacf separately).
acf2(rec, 48) # will produce values and a graphic
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36] [,37] [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48]
## ACF 0.92 0.78 0.63 0.48 0.36 0.26 0.18 0.13 0.10 0.08 0.06 0.03 -0.04 -0.11 -0.19 -0.24 -0.27 -0.27 -0.24 -0.19 -0.11 -0.03 0.03 0.07 0.06 0.02 -0.01 -0.05 -0.09 -0.11 -0.12 -0.10 -0.05 0.02 0.09 0.12 0.11 0.06 0.01 -0.02 -0.03 -0.03 -0.02 0.01 0.06 0.11 0.17 0.20
## PACF 0.92 -0.44 -0.05 -0.02 0.07 -0.03 -0.03 0.04 0.05 -0.02 -0.06 -0.14 -0.15 -0.05 0.05 0.01 0.01 0.02 0.08 0.11 0.03 -0.03 -0.01 -0.07 -0.12 -0.03 0.05 -0.08 -0.04 -0.03 0.06 0.05 0.15 0.09 -0.04 -0.10 -0.09 -0.02 0.05 0.08 -0.02 -0.01 -0.02 0.05 0.00 0.05 0.08 -0.04
In Lecture 8, we briefly discussed one-step-ahead forecasting and its mean squared prediction error. In practice, we can fit the model by OLS using ar.ols and forecast with the predict function. We can also obtain a \((1-\alpha)\) prediction interval of the form \[x_{n+m}^n \pm c_{\alpha/2} \sqrt{P_{n+m}^n}\] In the following forecasting problem, we use the Recruitment series; from the previous plot, \(p=2\) is reasonable.
regr = ar.ols(rec, order=2, demean=F, intercept=TRUE) # regression
# regr$asy.se.coef # standard errors
fore = predict(regr, n.ahead=24)
ts.plot(rec, fore$pred, col=1:2, xlim=c(1980,1990), ylab="Recruitment")
U = fore$pred+fore$se; L = fore$pred-fore$se
xx = c(time(U), rev(time(U))); yy = c(L, rev(U))
polygon(xx, yy, border = 8, col = gray(.6, alpha = .2))
lines(fore$pred, type="p", col=2)
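The shaded band above is \(\pm 1\) standard error, i.e. roughly a 68% interval; for a nominal 95% interval take \(c_{\alpha/2} = z_{0.975} \approx 1.96\). A self-contained sketch on a simulated AR(2) standing in for the Recruitment series (the simulation keeps the example runnable without astsa):

```r
set.seed(90210)
xsim <- arima.sim(list(order = c(2, 0, 0), ar = c(1.5, -.75)), n = 144)
regr <- ar.ols(xsim, order.max = 2, demean = FALSE, intercept = TRUE)
fore <- predict(regr, n.ahead = 24)         # 24-step-ahead forecasts with SEs
U <- fore$pred + qnorm(.975) * fore$se      # upper 95% limit
L <- fore$pred - qnorm(.975) * fore$se      # lower 95% limit
```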
In fitting an ARMA(p, q) model to a given time series, we can first use the ACF and PACF to determine \(q\) and \(p\) respectively. To estimate the autoregressive and moving average polynomials, we can use the Yule-Walker estimator (for AR(p); see Lecture 8), maximum likelihood, etc.
Let's consider the Recruitment series again. We use the Yule-Walker estimator to fit an AR(2) model and compare its predictions with the previous OLS fit.
rec.yw = ar.yw(rec, order=2)
rec.yw$x.mean # mean estimate
## [1] 62.17732
rec.yw$ar # parameter estimates
## [1] 1.3313857 -0.4441262
sqrt(diag(rec.yw$asy.var.coef)) # standard errors
## [1] 0.04252058 0.04252058
rec.yw$var.pred # error variance estimate
## [1] 95.90749
rec.pr = predict(rec.yw, n.ahead=24)
U = rec.pr$pred + rec.pr$se
L = rec.pr$pred - rec.pr$se
minx = min(rec,L); maxx = max(rec,U)
ts.plot(rec, rec.pr$pred, xlim=c(1980,1990), ylim=c(minx,maxx))
xx = c(time(U), rev(time(U))); yy = c(L, rev(U))
polygon(xx, yy, border = 8, col = gray(.6, alpha = .2))
lines(rec.pr$pred, type="p", col=2)
Lastly, let’s look at an ARIMA model. The ARIMA(0,1,1), or IMA(1,1), model is of interest because many economic time series can be successfully modeled this way. In addition, the model leads to a frequently used, and abused, forecasting method called exponentially weighted moving averages (EWMA). We write the model as \[x_t = x_{t-1} + w_t - \lambda w_{t-1}\] with \(|\lambda|<1\) and \(x_0=0\). By writing \(y_t = \nabla x_t\) we can show that \[x_t = \sum_{j=1}^\infty (1-\lambda)\lambda^{j-1} x_{t-j} + w_t\] The truncated forecasts are \[\tilde x_{n+1}^n = (1-\lambda)x_n + \lambda \tilde x_n^{n-1}, \quad P_{n+m}^n \approx \sigma^2_w [1+(m-1)(1-\lambda)^2]\] In EWMA, the parameter \(1 -\lambda\) is often called the smoothing parameter, and larger values of \(\lambda\) lead to smoother forecasts. In R, this is accomplished with the HoltWinters command.
set.seed(666)
x = arima.sim(list(order = c(0,1,1), ma = -0.8), n = 100)
(x.ima = HoltWinters(x, beta=FALSE, gamma=FALSE)) # α is 1-λ here
## Holt-Winters exponential smoothing without trend and without seasonal component.
##
## Call:
## HoltWinters(x = x, beta = FALSE, gamma = FALSE)
##
## Smoothing parameters:
## alpha: 0.1663072
## beta : FALSE
## gamma: FALSE
##
## Coefficients:
## [,1]
## a -2.241533
plot(x.ima)
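Under the hood this is just the EWMA recursion \(\tilde x_{n+1}^n = (1-\lambda)x_n + \lambda \tilde x_n^{n-1}\) with \(\alpha = 1-\lambda\). Assuming HoltWinters initializes the level at \(x_1\) (its documented default for a level-only model), we can reproduce its fitted one-step forecasts by hand:

```r
set.seed(666)
x  <- arima.sim(list(order = c(0, 1, 1), ma = -0.8), n = 100)
hw <- HoltWinters(x, beta = FALSE, gamma = FALSE)
a  <- hw$alpha; n <- length(x)
s  <- numeric(n); s[1] <- x[1]               # level initialized at x[1]
for (t in 2:n) s[t] <- a * x[t] + (1 - a) * s[t - 1]
# the forecast for time t+1 is the level at time t
all.equal(as.numeric(hw$fitted[, "xhat"]), s[1:(n - 1)])
```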
Finally, we close by fitting an ARIMA model to the U.S. GNP data. In this example, we consider quarterly U.S. GNP from 1947(1) to 2002(3), n = 223 observations. The data are real U.S. gross national product in billions of chained 1996 dollars and have been seasonally adjusted; they were obtained from the Federal Reserve Bank of St. Louis (http://research.stlouisfed.org/).
# par(mfrow=c(2,1))
plot(gnp)
acf2(gnp, 50)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36] [,37] [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48] [,49] [,50]
## ACF 0.99 0.97 0.96 0.94 0.93 0.91 0.90 0.88 0.87 0.85 0.83 0.82 0.80 0.79 0.77 0.76 0.74 0.73 0.72 0.7 0.69 0.68 0.66 0.65 0.64 0.62 0.61 0.60 0.59 0.57 0.56 0.55 0.54 0.52 0.51 0.5 0.49 0.48 0.47 0.45 0.44 0.43 0.42 0.41 0.40 0.38 0.37 0.36 0.35 0.33
## PACF 0.99 0.00 -0.02 0.00 0.00 -0.02 -0.02 -0.02 -0.01 -0.02 0.00 -0.01 0.01 0.00 0.00 0.00 0.01 0.00 -0.01 0.0 -0.01 -0.01 0.00 0.00 0.00 -0.01 0.00 -0.01 -0.01 -0.01 -0.01 -0.01 0.00 -0.01 0.00 0.0 0.00 -0.01 -0.01 -0.01 0.00 -0.01 -0.01 -0.01 -0.01 -0.01 -0.01 -0.02 -0.02 -0.01
Consider a log transformation and differencing, which yield the quarterly growth rate.
# par(mfrow=c(2,1))
gnpgr = diff(log(gnp)) # growth rate
plot(gnpgr)
acf2(gnpgr, 24)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24]
## ACF 0.35 0.19 -0.01 -0.12 -0.17 -0.11 -0.09 -0.04 0.04 0.05 0.03 -0.12 -0.13 -0.10 -0.11 0.05 0.07 0.10 0.06 0.07 -0.09 -0.05 -0.10 -0.05
## PACF 0.35 0.08 -0.11 -0.12 -0.09 0.01 -0.03 -0.02 0.05 0.01 -0.03 -0.17 -0.06 0.02 -0.06 0.10 0.00 0.02 -0.04 0.01 -0.11 0.03 -0.03 0.00
Now we fit candidate models and examine the diagnostics.
# par(mfrow=c(2,1))
sarima(gnpgr, 1, 0, 0) # AR(1)
## initial value -4.589567
## iter 2 value -4.654150
## iter 3 value -4.654150
## iter 4 value -4.654151
## iter 4 value -4.654151
## iter 4 value -4.654151
## final value -4.654151
## converged
## initial value -4.655919
## iter 2 value -4.655921
## iter 3 value -4.655922
## iter 4 value -4.655922
## iter 5 value -4.655922
## iter 5 value -4.655922
## iter 5 value -4.655922
## final value -4.655922
## converged
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans,
## fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ar1 xmean
## 0.3467 0.0083
## s.e. 0.0627 0.0010
##
## sigma^2 estimated as 9.03e-05: log likelihood = 718.61, aic = -1431.22
##
## $degrees_of_freedom
## [1] 220
##
## $ttable
## Estimate SE t.value p.value
## ar1 0.3467 0.0627 5.5255 0
## xmean 0.0083 0.0010 8.5398 0
##
## $AIC
## [1] -6.44694
##
## $AICc
## [1] -6.446693
##
## $BIC
## [1] -6.400958
sarima(gnpgr, 0, 0, 2) # MA(2)
## initial value -4.591629
## iter 2 value -4.661095
## iter 3 value -4.662220
## iter 4 value -4.662243
## iter 5 value -4.662243
## iter 6 value -4.662243
## iter 6 value -4.662243
## iter 6 value -4.662243
## final value -4.662243
## converged
## initial value -4.662022
## iter 2 value -4.662023
## iter 2 value -4.662023
## iter 2 value -4.662023
## final value -4.662023
## converged
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans,
## fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ma1 ma2 xmean
## 0.3028 0.2035 0.0083
## s.e. 0.0654 0.0644 0.0010
##
## sigma^2 estimated as 8.919e-05: log likelihood = 719.96, aic = -1431.93
##
## $degrees_of_freedom
## [1] 219
##
## $ttable
## Estimate SE t.value p.value
## ma1 0.3028 0.0654 4.6272 0.0000
## ma2 0.2035 0.0644 3.1594 0.0018
## xmean 0.0083 0.0010 8.7178 0.0000
##
## $AIC
## [1] -6.450133
##
## $AICc
## [1] -6.449637
##
## $BIC
## [1] -6.388823
ARMAtoMA(ar=.35, ma=0, 10) # prints psi-weights
## [1] 3.500000e-01 1.225000e-01 4.287500e-02 1.500625e-02 5.252187e-03 1.838266e-03 6.433930e-04 2.251875e-04 7.881564e-05 2.758547e-05
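For an AR(1), the \(\psi\)-weights are simply \(\psi_j = \phi^j\), matching the printout above:

```r
phi <- 0.35
all.equal(ARMAtoMA(ar = phi, ma = 0, 10), phi^(1:10))  # TRUE
```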